Truncated Approximate Dynamic Programming with Task-Dependent Terminal Value

Authors

  • Amir-massoud Farahmand
  • Daniel Nikovski
  • Yuji Igarashi
  • Hiroki Konaka
Abstract

We propose a new class of computationally fast algorithms for finding a close-to-optimal policy for Markov Decision Processes (MDPs) with a large finite horizon T. The main idea is that instead of planning until the time horizon T, we plan only up to a truncated horizon H ≪ T and use an estimate of the true optimal value function as the terminal value. Our approach to finding the terminal value function is to learn a mapping from an MDP to its value function by solving many similar MDPs during a training phase and fitting a regression estimator. We analyze the method by providing an error propagation theorem that shows the effect of various sources of error on the quality of the solution. We also empirically validate this approach in a real-world application, the design of an energy management system for Hybrid Electric Vehicles, with promising results.

2016 AAAI Conference on Artificial Intelligence. Copyright © Mitsubishi Electric Research Laboratories, Inc., 2016.

Affiliations: Amir-massoud Farahmand and Daniel N. Nikovski, Mitsubishi Electric Research Laboratories, Cambridge, MA, USA; Yuji Igarashi and Hiroki Konaka, Mitsubishi Electric Corporation, Hyogo 661-8661, Japan.
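The truncated-horizon idea in the abstract can be illustrated with a minimal sketch on a toy tabular MDP. Everything here is an assumption made for illustration and is not taken from the paper: the function name `truncated_plan`, the two-state transition table `P`, the reward table `R`, and the reward-maximizing convention. In particular, the paper learns the terminal value with a regression estimator trained on many similar MDPs; this sketch substitutes the exact value-to-go, computed by full planning, as a stand-in for that learned estimate.

```python
from typing import List, Tuple

def truncated_plan(
    P: List[List[List[float]]],   # P[s][a][s2]: transition probabilities (hypothetical toy model)
    R: List[List[float]],         # R[s][a]: immediate rewards (hypothetical toy model)
    H: int,                       # number of backward-induction steps to run
    terminal_value: List[float],  # value assigned to each state at the end of the H steps
) -> Tuple[List[float], List[List[int]]]:
    """Run H steps of backward induction seeded with terminal_value.

    Returns the value at the first step and a time-indexed greedy policy,
    where policy[t][s] is the action chosen in state s at step t.
    """
    n_states, n_actions = len(R), len(R[0])
    V = list(terminal_value)
    policy: List[List[int]] = []
    for _ in range(H):
        new_V, greedy = [], []
        for s in range(n_states):
            # One-step lookahead: reward plus expected value of the next state.
            q = [
                R[s][a] + sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=q.__getitem__)
            greedy.append(best)
            new_V.append(q[best])
        V = new_V
        policy.insert(0, greedy)  # earlier time steps go to the front
    return V, policy

# Toy two-state, two-action MDP (illustrative numbers only).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]

T, H = 50, 5
zero = [0.0, 0.0]

V_full, _ = truncated_plan(P, R, T, zero)         # plan over the full horizon T
V_tail, _ = truncated_plan(P, R, T - H, zero)     # exact value-to-go after H steps
V_trunc, policy = truncated_plan(P, R, H, V_tail) # plan H steps with that terminal value
V_naive, _ = truncated_plan(P, R, H, zero)        # plan H steps with no terminal value
```

With an exact terminal value, planning over H steps recovers the full-horizon value, whereas ignoring the terminal value (`V_naive`) underestimates it. With a learned, approximate terminal value, the gap would depend on the estimation error, which is what the error propagation analysis mentioned in the abstract addresses.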


Similar resources

A max-plus method for the approximate solution of discrete time linear regulator problems with non-quadratic terminal payoff

Efficient Riccati-equation-based techniques for the approximate solution of discrete time linear regulator problems are restricted in their application to problems with quadratic terminal payoffs. Where non-quadratic terminal payoffs are required, these techniques fail due to the attendant non-quadratic value functions involved. In order to compute these non-quadratic value functions, it is ofte...


Robust inter and intra-cell layouts design model dealing with stochastic dynamic problems

In this paper, a novel quadratic assignment-based mathematical model is developed for concurrent design of robust inter and intra-cell layouts in dynamic stochastic environments of manufacturing systems. In the proposed model, in addition to considering time value of money, the product demands are presumed to be dependent normally distributed random variables with known expectation, variance, a...


Expected Duration of Dynamic Markov PERT Networks

In this paper, we apply stochastic dynamic programming to approximate the mean project completion time in dynamic Markov PERT networks. It is assumed that the activity durations are independent random variables with exponential distributions, but some social and economic problems influence the mean of the activity durations. It is also assumed that the social problems evolve in ac...


OPTIMIZATION OF A PRODUCTION LOT SIZING PROBLEM WITH QUANTITY DISCOUNT

The dynamic lot sizing problem is one of the significant problems in industrial units, and it has been considered by many researchers. Considering the quantity discount in purchasing cost is one of the important and practical assumptions in the field of inventory control models, yet it has received less attention in the stochastic version of the dynamic lot sizing problem. In this paper, stochastic dyn...


Trajectory tracking of under-actuated nonlinear dynamic robots: Adaptive fuzzy hierarchical terminal sliding-mode control

In recent years, trajectory tracking of underactuated nonlinear dynamic systems, such as space robots and manipulators with structural flexibility, has become a major field of interest due to the complexity and high computational load of these systems. Hierarchical sliding mode control has been investigated recently for these systems; however, instability phenomena may occur, especia...




Publication date: 2016